In this course, you’re learning not just the statistical concepts, but also how to produce the statistics. Analyzing data requires learning to use new technology.
Learning statistical software to analyze data can be really fun. You get to learn about real-world social problems!
But it can also be frustrating. There’s even a bingo card of common errors (i.e., bugs) that new statistical programmers can expect to experience.
Calculating the statistics by hand quickly gets cumbersome, time-consuming, and difficult.
Good social science is built on replication.
Replication using technology requires researchers to use sometimes unfamiliar software, working on devices with unique environments and settings.
When it feels like the technology is preventing you from getting to the course content, take a deep breath, and remember that building your technology skills is part of this course.
Learning to use statistical software necessitates grappling.
“Grappling is like perseverance, but it goes beyond that. Perseverance means trying again and again, even after you’ve failed. Grappling implies trying even before you fail the first time.
It’s thinking, “First, I’ll work with it independently. Okay, I’m really not understanding it. Let me go back to my notes. Okay, I have solved for the first part of it. Now I have the second part of it. Okay, I got the question wrong; let me try again. Maybe I can ask my peer now.”
Grappling is working hard to make sure you understand the problem fully, and then using every resource at your fingertips to solve it.”
Most statistical analyses get done not because the analyst is a math genius, but because the analyst is an excellent problem-solver who persisted through a minefield of technical issues.
It is a misperception that the best statistical analysts sit down at their computers and type code from memory.
Much of the process of coding is copying code from somewhere else and modifying it to fit your particular situation.
When none of these strategies fix the issue, it is time to ask for help.
The same replication principles should be used when asking for help.
Make someone else feel your pain!
When asking for help, do what you can to create a reproducible example.
Search for answers before posting your question.
Describe the problem.
“It doesn’t work” isn’t descriptive enough.
Describe your environment.
What operating system are you using? Which R version? What packages? Dataset?
Describe the solution.
Confirm whether an offered solution works. Or, if you solve it on your own, post how you solved it.
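As an illustration of the checklist above, a reproducible example might look like the sketch below. The problem shown (a mean that comes back `NA`) is a made-up stand-in for whatever issue you are facing. The built-in `mtcars` dataset is used so that anyone can run the code without access to your files.

```r
# Use a built-in dataset so helpers do not need your data files.
df <- head(mtcars, 5)

# The smallest piece of code that still shows the problem.
# Hypothetical issue: one missing value makes the mean come back NA.
df$mpg[2] <- NA
mean(df$mpg)

# If you find the fix yourself, post it too.
mean(df$mpg, na.rm = TRUE)

# Describe your environment: operating system, R version, loaded packages.
R.version.string
sessionInfo()
```

Pasting the code together with the output of `sessionInfo()` answers most of the environment questions before anyone has to ask them.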
In this class, the fastest way to get help on Lab Assignments is during class time and lab sessions.
The second fastest way is to post your reproducible example on the class discussion board.
Open RStudio, then click the dropdown arrow next to the “New File” icon, and then “R script.”
When your script is open, you’ll see four key regions or “panes” in the interface:
Source pane: where you can edit and save R scripts or author computational documents like Quarto and R Markdown.
Console pane: where you write short interactive R commands.
Environment pane: displays temporary R objects created during that R session.
Output pane: displays the plots, tables, or HTML outputs of executed code along with files saved to disk.
There are many file types, but these are key to an R & RStudio workflow (and likely new to you):
| Extension | Description |
|---|---|
| .R | R scripts store a sequence of R commands (code) that can be run all at once or line by line. |
| .qmd | Quarto Markdown creates reproducible documents that contain a combination of text, code, and output. |
| .RData (or sometimes .rda) | These store R objects, such as data frames, so they can be loaded again later. |
| .Rproj | RStudio project file (keeps project settings). |
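To make the `.RData` row concrete, here is a minimal sketch of saving an R object to disk and loading it back. The data frame `scores` and the file name are invented for illustration, and the file is written to a temporary folder so that nothing on your computer is overwritten.

```r
# A small example data frame (hypothetical data).
scores <- data.frame(id = 1:3, score = c(90, 85, 77))

# Save the object to an .RData file in a temporary folder.
path <- file.path(tempdir(), "scores.RData")
save(scores, file = path)

# Remove the object from memory, then restore it from the file.
rm(scores)
load(path)
scores
```

Note that `load()` restores the object under its original name (`scores`), so you do not assign its result to a new name.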
R comes with basic tools, but packages extend the capabilities of base R (what you already installed). An R package is like a toolbox: a collection of functions, data, and documentation that help you do specific tasks using R.
You’ll install each package (only once per system):
You’ll load each package (every time you use it):
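The two steps can be sketched as follows. The `here` package is used as the example because it appears later in these notes; the `requireNamespace()` check is an optional convenience (an assumption, not a course requirement) that skips the installation when the package is already on your system.

```r
# Install once per system. The check avoids re-installing on every run.
if (!requireNamespace("here", quietly = TRUE)) {
  install.packages("here")
}

# Load in every new R session, usually at the top of your script.
library(here)
```

Note the quotation marks: `install.packages()` needs the package name in quotes, while `library()` works without them.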
The guiding principle for workflow.
A workflow of data analysis is a process for managing all aspects of data analysis.
Planning, documenting, and organizing your work; cleaning the data; creating, renaming, and verifying variables; performing and presenting statistical analyses; producing replicable results; and archiving what you have done are all integral parts of your workflow.
| Step | What to know |
|---|---|
| Set up | Systematic organization of the project and project files. |
| Familiarize self with data | Skipping takes more time in the long run. |
| Process data | Takes the MOST time. |
| Running analyses | What people THINK takes the most time. |
| Presenting results | What people (wrongly) think does not take time. |
RStudio projects give you tools for an organized and reproducible workflow.
Create an RStudio project for each data analysis project. Everything you need is in one place, and cleanly separated from all the other projects that you are working on.
To create a new project in RStudio, use the File > New Project command.
In the New Project wizard that pops up, select New Directory, then New Project.
Name the project “SOC6302” and then click the Create Project button.
This will launch you into a new RStudio Project inside a new folder called “SOC6302”.
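Once the project exists, one way to set up its folders is from R itself. The sketch below is only a suggestion: the folder names `data` and `output` are assumptions (only `scripts` appears in these notes), and `root` points to a temporary folder so the example can be run safely. In your own project, you would set `root` to the project folder instead.

```r
# For demonstration, build the structure inside a temporary folder.
# In a real project, `root` would be your SOC6302 project folder.
root <- file.path(tempdir(), "SOC6302")

# Hypothetical folder layout: one folder per kind of file.
folders <- c("scripts", "data", "output")
for (folder in folders) {
  dir.create(file.path(root, folder), recursive = TRUE, showWarnings = FALSE)
}

# Check what was created.
list.files(root)
```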
Adopting a project-based workflow means you do not have to change file paths when the project moves to a different folder or computer.
ABSOLUTE FILE PATHS
Department of Sociology
Unit 17100, 17th Floor, Ontario Power Building
700 University Ave., Toronto, ON M5G 1Z5
C:\Users\Pepin\GitHub\SOC6302\scripts
RELATIVE FILE PATHS
Take the left side elevators to the 17th floor.
Go through the double doors and take a right.
First door on your left.
here("scripts")
here::here() # find the file path to the root of the project
should be
session_info() # software documentation
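A short sketch of how `here()` builds file paths relative to the project root. The `data` folder and the file name `survey.csv` are hypothetical and only illustrate the pattern. This assumes the `here` package is installed.

```r
library(here)

# The project root: the folder that holds your .Rproj file.
here()

# Paths are built from the root, one folder or file name per argument.
here("scripts")
here("data", "survey.csv")  # hypothetical file, for illustration only
```

Because each path is assembled from the project root, the same script works on any computer without editing the file paths, unlike an absolute path such as `C:\Users\Pepin\GitHub\SOC6302\scripts`.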
Quarto: The tool you’ll use to create reproducible computational documents. Every assignment you hand in will be a Quarto document.
Note
You are likely familiar with word processors like MS Word or Google Docs. We will not be using these in this class. Instead, the words you would write in such a document, as well as your R code, will go into a Quarto document. You will render the document (more on what this means later) to get a document out that has your words, code, and the output of that code. Everything in one place, beautifully formatted!
Start every RStudio session with a clean memory by turning off the automatic saving of your workspace to an .RData file when you quit RStudio. This is important for reproducibility, debugging, and avoiding littering your computer with unnecessary files.
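Set this via Tools > Global Options > General: uncheck “Restore .RData into workspace at startup” and set “Save workspace to .RData on exit” to “Never.”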